ABSTRACT 

Characterization of small non-coding ribonucleic 
acids (sRNA) among the large volume of data 
generated by high-throughput RNA-seq or tiling 
microarray analyses remains a challenge. Thus, 
there is still a need for accurate in silico prediction 
methods to identify sRNAs within a given bacterial 
species. After years of effort, dedicated software 
tools were developed based on comparative genomic 
analyses or mathematical/statistical models. 
Although these genomic analyses enabled sRNAs 
in intergenic regions to be efficiently identified, 
they all failed to predict antisense sRNA genes 
(asRNA), i.e. RNA genes located on the DNA strand 
complementary to that which encodes the protein. 
The statistical models enabled any genomic region 
to be analyzed in theory, but not efficiently. We 
present a new model for in silico identification of 
sRNA and asRNA candidates within an entire bacterial 
genome. This model was successfully used to 
analyze the Gram-negative Escherichia coli and 
Gram-positive Streptococcus agalactiae. In both 
bacteria, numerous asRNAs are transcribed from 
the complementary strand of genes located in 
pathogenicity islands, strongly suggesting that 
these asRNAs are regulators of virulence 
expression. In particular, we characterized an asRNA 
that acted as an enhancer-like regulator of the type 
1 fimbriae production involved in the virulence of 
extra-intestinal pathogenic E. coli. 



INTRODUCTION 

The number of metabolic pathways in eubacteria known 
to be controlled by regulatory small RNAs (sRNAs) is 
growing. These sRNAs often regulate gene expression 
post-transcriptionally by modulating mRNA translation 
and/or mRNA stability through antisense mechanisms 
involving base-pairing interactions with dedicated 
mRNA targets (1). Mechanistic studies revealed that 
sRNAs also modulate protein activity by sequestering 
proteins to modify their structures (2) or control the quality 
of protein synthesis (3). Most of the characterized bacterial 
sRNA genes have been found in the intergenic 
regions (IGRs) of the core genome; in mobile genetic 
elements, such as insertion sequences, plasmids and 
phages (4); or in pathogenicity islands (PAIs) (5,6). 
Previous studies have shown that sRNAs can regulate 
both bacterial metabolism and pathogenicity (7). 

Recent data from high-throughput sequencing of the 
transcriptome (RNA-seq) and tiling microarray analyses 
have demonstrated the expression of many complementary 
sRNA/mRNA transcript pairs in Listeria monocytogenes 
(8), Helicobacter pylori (9) and Escherichia coli 
(10). These results highlight that the number of sRNA 
genes located at the same genomic locus as protein-coding 
genes (CDS), but on the opposite DNA strand, 
was underestimated. The sRNA molecules encoded by 
these genes are referred to as antisense RNAs (asRNA) or 
naturally occurring RNAs. It was deduced from these 
studies that the diversity of sRNAs is likely to be much 
greater than expected, particularly for asRNA genes, 
which in turn raises a plethora of questions about their 
functions (11). A few recent studies have indicated that 
asRNA genes encoding molecules that are partially (12) 
or fully complementary to a CDS (13) have a physiological 
role, but the contribution of asRNAs to the regulation of 
metabolism and pathogenicity has not been studied extensively. 
RNA-seq and tiling microarrays represent significant 
technical advances for the identification of sRNAs 
because the whole transcriptome can be analyzed. 
However, both techniques have strong limitations, 
particularly in terms of experimental costs and the cumbersome 
nature of the data analysis and experimental 
procedure, which includes the crucial choice of relevant 
strains and growth conditions. Thus, in silico methods 
remain of great interest for screening a large number 
of genomes without costly and time-consuming tasks. 

Many methods for in silico identification of sRNAs 
exist, but only a few algorithms can efficiently predict 
sRNA gene loci in the full bacterial genome sequence 
(14). Different in silico methods based on comparative 
genomics (15-19), statistics/probability analyses (20-24), 
and RNA secondary structure analyses (16,25) have been 
developed, but they vary considerably in efficacy. The most 
recent algorithms for identification of sRNA genes 
combine several pre-existing independent 
methods to increase their sensitivity and predictive 
potential. However, most of these sRNA gene finders were 
first designed for and mainly applied to Gram-negative 
bacteria, and they require significant adjustments to 
analyze genomes of unrelated bacteria. Most of the 
methods based on comparative genomics to identify 
small (<500 nt) conserved gene structures, including 
promoter sequences, were highly dependent on the bacterial 
order (15). Indeed, transcription promoters are highly 
diversified, and DNA recognition consensus sequences 
among bacterial species are often divergent or not 
known. Only the identification of Rho-independent terminators 
(RITs) seemed to be a valuable search for building an 
almost general sRNA gene finder, and it can constitute the 
basis of a gene signature search algorithm. Restriction of 
the computational searches for novel sRNA genes to those located 
in the IGRs constitutes another important limitation of 
the current algorithms. Studies using machine learning 
algorithms [i.e. stochastic context-free grammar (16), neural 
networks (20), boosted genetic programming (22), gapped 
Markov model (23) and support vector machine (24) 
methods] enabled the detection of new sRNAs in 
protein-coding regions, but the number of putative 
asRNAs identified varies between studies and 
some of these studies lacked in vivo validation. 
Comparison of the data obtained by the application of 
these mathematical models with those recently obtained 
by RNA-seq or tiling microarray analyses demonstrated 
that the efficiency of these in silico analyses needs 
improvement. The failure of these methods to identify 
most asRNAs partially or fully overlapping protein-coding 
genes is probably related to their low efficiency in 
discriminating sequence conservation due to the presence of 
a protein-coding sequence from conservation due to the 
presence of an asRNA gene. While these strategies are 
interesting, their limitations are inherent to the diversity of 
RNA secondary structures, which impairs the efficiency 
of the covariance model, especially for unstructured 
sRNAs (16). Despite all the efforts made, current methods 
could be improved and a number of strategies remain to 
be tested. 

We report here the development and validation of a new 
in silico strategy that successfully identifies known and 
new sRNA genes, including those located in intergenic and CDS 
regions, based on the analysis of the complete 
genome sequence of Gram-negative and Gram-positive 
bacteria. Improvement of current RIT searches and covariation 
identification by our new algorithms enhanced 
sRNA discovery. For example, analysis of the genomes 
of extra-intestinal pathogenic E. coli (ExPEC) and 
Streptococcus agalactiae, two opportunistic pathogens in 
which gene regulation undoubtedly plays an important role 
in pathogenesis, led to the identification of numerous 
new sRNAs, including asRNA genes specific for the 
ExPEC strains or the Group B Streptococci. Transcription 
analysis of sRNAs located close to pathogenicity-associated 
gene clusters and functional characterization 
of two asRNAs suggested that they might control the 
expression of pathogenicity-related genes in both bacteria, 
which confirmed the efficiency of our new method. 

MATERIALS AND METHODS 

Genome and pathogenicity island sequences 

All genome sequences of E. coli and S. agalactiae were 
obtained from the Genbank database 
(http://www.ncbi.nlm.nih.gov/genbank/). The PAI-I(AL862) of the E. coli AL862 
strain was sequenced at the Pasteur Institute and was 
deposited in Genbank under accession number GQ497943. 

Identification of RITs 

For Gram-negative bacteria, RITs were predicted with the 
RNAMotif program (26) by a slightly modified version of 
the previously described method (27). We used the perfect 
stem-loop structure template as described, except that we 
permitted no more than one mismatch within the stem 
structure. We also used the same scoring formula, except 
that the ΔG°37 of the RNA:DNA hybrid duplex of the 
poly-uracil tail and its complementary genomic sequence 
was scored with Melting4 software, using nearest-neighbor 
thermodynamic parameters (28). All candidates 
with a score greater than -4.0 kcal/mol were removed. 
For Gram-positive bacteria, Rho-independent terminators 
were predicted by TransTermHP (29). 

Bacterial strains and growth conditions 

All E. coli strains (Table 1) were cultured in Luria Bertani 
(LB) medium or M9 medium supplemented with 0.4% sodium pyruvate. 
S. agalactiae NEM316 was grown in Todd Hewitt 
(TH) or RPMI 1640 medium supplemented with 0.4% 
glucose and 5% 1 M HEPES buffer. Antibiotics for 
plasmid selection were used at the following concentrations: 
for E. coli, carbenicillin, 100 µg/ml, kanamycin, 
50 µg/ml, and chloramphenicol, 12.5 µg/ml; for 
S. agalactiae, erythromycin, 5 µg/ml. The 536 Δhfq::KmFRT 
strain was constructed by the allelic exchange recombination 
protocol using the thermosensitive plasmid 
pKOBEG-Apra (36). The 500 nucleotides adjacent to 



2848 Nucleic Acids Research, 2012, Vol. 40, No. 7 



Table 1. Strains and plasmids used in this study 

Name | Description | Genotype/Resistance (a) | Reference 

Strains 
E. coli AL862 | Sepsis-associated ExPEC isolate | afa8+ | (30) 
E. coli 536 | Pyelonephritis-associated ExPEC isolate (O6:K15:H31) | pap+, fim+ | (31) 
E. coli 536 Δfim::cat | Deletion of the full fim gene cluster | CmR | (32) 
E. coli 536 Δhfq::KmFRT | Allelic exchange of the hfq gene with kanamycin FRT cassette | KmR | This study 
S. agalactiae NEM316 | Human septicaemia isolate | | (33) 
E. coli TOP10 | Laboratory strain | fim' | (34) 
E. coli TOP10 Δhfq::KmFRT | hfq-deficient strain JVS-2001 | KmR | (34) 
E. coli TOP10 Δhfq::FRT | hfq-deficient strain JVS-2001 with the FRT-flanked kanamycin resistance cassette removed by action of the FLP flippase from pCP20 plasmid | KmS | This study 

Plasmids 
pCP20 | Thermosensitive plasmid expressing the flp flippase gene | CbR, CmR | (35) 
pKOBEG-Apra | Thermosensitive recombination plasmid used for allelic exchange | pSC101ts, ApraR | (36) 
pZE21-gfp | gfp gene under the control of the PLtetO-1 promoter | ColE1, KmR | (37) 
pZE2R-gfp | Replacement of the PLtetO-1 promoter from pZE21-gfp by the Pi constitutive promoter | ColE1, KmR | (37) 
pZE21-null | pZE21-gfp derivative expressing a nonsense sRNA | ColE1, KmR | This study 
pZE2R-null | pZE2R-gfp derivative expressing a nonsense sRNA | ColE1, KmR | This study 
pZE2R-fimR | Insertion of fimR gene into the EcoRI/XbaI sites of the pZE2R-gfp plasmid | ColE1, KmR | This study 
pZE21-antifimR | Insertion of fimR antisense sequence into the EcoRI/XbaI sites of the pZE21-gfp plasmid | ColE1, KmR | This study 
pZE2R-SQ18 | Insertion of SQ18 gene into the EcoRI/XbaI sites of the pZE2R-gfp plasmid | ColE1, KmR | This study 
pXG-0 | Luciferase-expressing plasmid | pSC101*, CmR | (34) 
pXG-10 | Translational fusion of lacZ and gfp genes | pSC101*, CmR | (34) 
pXGfimD::gfp | pXG10 derivative with a fimD::gfp translational fusion | pSC101*, CmR | This study 
pXGgbs0031::gfp | pXG10 derivative with a gbs0031::gfp translational fusion | pSC101*, CmR | This study 
pTCV-erm-ΩPtet | Shuttle low-copy vector to analyze regulatory elements in Gram-positive bacteria under the control of the constitutive promoter Ptet | pAMβ1, ErmR | S. Dramsi 
pTCV-SQ18 | Insertion of the SQ18 sRNA gene into the BamHI/PstI sites of pTCVerm-Ptet plasmid | pAMβ1, ErmR | This study 
pTCV-SQ485 | Insertion of the SQ485 sRNA gene into the BamHI/PstI sites of pTCVerm-Ptet plasmid | pAMβ1, ErmR | This study 
pTCV-SQ893 | Insertion of the SQ893 sRNA gene into the BamHI/PstI sites of pTCVerm-Ptet plasmid | pAMβ1, ErmR | This study 

(a) Apra, Cb, Cm, Erm, Km: resistance to apramycin, carbenicillin, chloramphenicol, erythromycin and kanamycin, respectively. 



the 5' and 3' regions of the hfq gene were amplified and 
assembled with the FRT-flanked kanamycin cassette 
from the pKD4 plasmid by PCR prior to strain transformation 
(38). 

RNA sample preparation 

All cultures were established with a 1/50 dilution of an 
overnight culture and incubated at 37°C with shaking at 
140 rpm. Samples were prepared from cultures stopped 
during the exponential phase of growth (OD600 of 0.6 for 
E. coli or OD600 of 0.4 for S. agalactiae) or in stationary 
phase after 24 h for both bacteria. Total RNA was 
isolated from E. coli strains with Trizol (Invitrogen), 
used according to the manufacturer's protocol except 
that the bacteria were harvested by centrifugation at 
4000 g for 5 min at room temperature, to prevent cold 
shock stress. Total RNA was extracted from S. agalactiae 
with hot phenol as described (5). RNA 
samples were treated twice with 30 units of DNase I 
(Amersham) for 90 min at 37°C, extracted by 
phenol/chloroform treatment and precipitated in ethanol. 
The RNA was re-suspended in DEPC-treated water and 
checked for degradation on a 2% agarose gel. 
Genomic DNA contamination was assessed by PCR 
amplification of the 5S RNA using the 5S.Fw and 
5S.RT primers. 

RACE experiments 

The 5'-ends of sRNAs were determined as 
previously described (39). 

Nested and classic RT-PCR 

Complementary DNAs (cDNA) were synthesized from 5 µg of 
heat-denatured total RNA with 200 units of Superscript 
III reverse transcriptase (Invitrogen). For analyses 
of sRNA expression, the reaction was performed at 55°C 
for 1 h with 2 pmol of gene-specific primer (Sigma Proligo) 
(Supplementary Table S1) to maintain stringent conditions 
and synthesize strand-specific products. For 
mRNA expression analysis, the reaction was performed 
at 42°C for 1 h with 200 ng of random hexamers according 
to the supplier's protocol. Reactions were inactivated by 
heating at 70°C for 10 min. The cDNA was amplified by 
PCR with 0.4 units of Taq polymerase (QBiogen), 
100 nM of each primer of the pair (gene.RT and gene.Fw, or 
gene.Nested and gene.Fw for nested PCR), 200 µM 
dNTP and 2 µl of the RT reaction. The thermal cycling 
conditions were 94°C for 3 min, followed by 40 cycles of 94°C for 30 s, 
55°C for 30 s and 72°C for 30 s, and a final extension at 
72°C for 7 min. PCR products were analyzed by electrophoresis 
in 4% ethidium bromide-stained agarose gels. 

Northern blot hybridization 

Northern blot membranes were prepared and hybridization 
was carried out as described (5). Briefly, RNA 
samples were separated by denaturing urea-polyacrylamide 
gel electrophoresis and transferred to Zeta-Probe 
GT membranes (Biorad). Membranes were hybridized 
with 32P 5'-end-labeled oligonucleotides in ExpressHyb 
(Clontech) and scanned with a PharosFX system (Biorad). 

Analysis of small RNA and mRNA interaction 

The pZE2R-null and pZE21-null plasmids were constructed 
by digesting the pZE2R-gfp and pZE21-gfp 
plasmids with EcoRI (Invitrogen) and XbaI (Roche). 
The DNA fragments containing the kanamycin resistance 
gene and the origin of replication were separated by gel 
electrophoresis and extracted from the agarose with the 
Qiagen gel extraction kit. We treated 200 ng of the two 
cleaved plasmid DNA fragments with Klenow enzyme 
(NEB) for 1 h at room temperature, followed by 
re-circularization with T4 DNA ligase (Fermentas) and 
transformation into the TOP10 strain. 

For expression of the FimR and SQ18 sRNAs in E. coli, 
we amplified the fimR gene from E. coli 536 and the SQ18 
gene from S. agalactiae NEM316 genomic DNA by PCR 
using Taq DNA polymerase (MPbio) with the cl.fimR.EcoRI 
and cl.fimR.XbaI or cl.SQ18.EcoRI and cl.SQ18.XbaI 
primers, respectively. The two PCR products were 
inserted into the pCRII-TOPO plasmid (Invitrogen). The 
pCRII-fimR and pCRII-SQ18 plasmids were digested with 
EcoRI and XbaI. The DNA band containing the sRNA 
gene was purified from the gel and ligated with T4 DNA 
ligase to pZE2R DNA digested with EcoRI and XbaI. 
The ligation products were transformed into the 
TOP10 strain, generating the pZE2R-fimR and pZE2R-SQ18 
plasmids. The pZE21-antifimR plasmid was constructed 
in the same way as pZE2R-fimR, except that we 
used the cl.antifimR.EcoRI and cl.antifimR.XbaI primers 
for PCR. 

The fimD::gfp and gbs0031::gfp fusion genes were expressed 
by inserting the fimD and gbs0031 CDSs lacking 
stop codons into the pXG10 plasmid as described (34). 
The DNA fragments containing the fimD and gbs0031 
CDSs were amplified with LA Taq (Takara) with the 
fimD.NheI and fimD.Mph1103I or gbs0031.NheI and 
gbs0031.Mph1103I primers, respectively. The other steps 
and Western blotting were done as described (34). 

For expression of the SQ18, SQ485 and SQ893 sRNAs in 
S. agalactiae, we amplified the three sRNA genes from 
S. agalactiae NEM316 genomic DNA by PCR using 
Taq DNA polymerase (MPbio) with the cl.SQ18.BamHI 
and cl.SQ18.PstI, cl.SQ485.BamHI and cl.SQ485.PstI 
or cl.SQ893.BamHI and cl.SQ893.PstI primer pairs, 
respectively. The PCR products were first cloned into the 
pCRII-TOPO plasmid (Invitrogen) and then recloned into 
the BamHI/PstI sites of the shuttle vector pTCV-erm-ΩPtet, 
giving the pTCV-SQ18, pTCV-SQ485 and 
pTCV-SQ893 expression plasmids. These vectors were 
introduced by electroporation into S. agalactiae NEM316. 

Analysis of expression by quantitative real-time PCR 

Total RNA was reverse-transcribed as described in the 
section on RT-PCR, except that 10 µg of total RNA was 
used. All primers were designed with Primer3 
(http://www-genome.wi.mit.edu/cgi-bin/primer/primer3_www.cgi). 
We determined mRNA and 5S RNA levels from 
cDNAs synthesized with random primers. The sRNA 
levels were analyzed with cDNAs synthesized with 
specific primers. All cDNA samples were analyzed using 
iQ SYBR green supermix (BioRad) according to the manufacturer's 
protocol and were run on a MyiQ thermal cycler 
(BioRad) with the following thermal cycling conditions: 
95°C for 5 min, then 40 cycles of 95°C for 30 s and 60°C for 60 s. 
All experiments were carried out with at least two duplicate 
RNA samples. The 5S rRNA was used as the reference 
gene, and the relative level of expression between samples 
was calculated by the ΔΔCt method (40). 
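
The ΔΔCt calculation used for relative quantification can be illustrated with a minimal Python sketch. It assumes 100% amplification efficiency (a twofold increase in product per cycle); the function name, argument names and Ct values are hypothetical and are not taken from the cited protocol (40). 

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target transcript in a sample versus a control.

    Each Ct is first normalized to the reference gene (here, 5S rRNA),
    then the sample is compared with the control condition.
    Assumption: amplification efficiency is 100% (factor 2 per cycle).
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target crosses the threshold two cycles
# earlier in the sample than in the control, with equal 5S rRNA levels.
fold_change = relative_expression(20.0, 10.0, 22.0, 10.0)
print(fold_change)
```

With these illustrative values the target is four times more abundant in the sample than in the control. 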

Yeast agglutination, motility and biofilm assays 

All assays were carried out with E. coli strains cultured in 
LB broth and incubated overnight at 37°C without 
shaking. The culture medium was removed by centrifugation 
and bacteria were washed once with 1X PBS. Yeast 
agglutination assays and motility tests were performed as 
described (41). Biofilm formation assays were conducted 
in polypropylene microtiter plates. Bacteria were grown 
statically in LB and M63 glucose media for 48 h, and 
biofilms were visualized by crystal violet staining as 
described (42). 

RESULTS 

Design and validation of an sRNA genefinder based on 
the identification of orphan RITs 

We hypothesized that the core prediction system for 
a versatile sRNA genefinder algorithm that preferentially 
predicts non-coding sRNAs should combine several 
functionalities. First, it should predict signatures 
composed of recognition sites for sRNA-binding 
proteins, for example RITs. Second, it should be able to 
inspect the flanking nucleic acid sequences using comparative 
genomics and RNA structure predictions plus a 
scoring method based on covariation analysis, to provide 
strong phylogenetic evidence for the existence of RNA 
stems (2,14). 

The RIT site, which is often involved in the termination 
processes of sRNA genes in E. coli (~70%) and in other 
bacteria such as Staphylococcus aureus (5), was used as a 
starting point for our sRNA search model (Figure 1). By 
applying it to the genome of the extensively studied E. coli 

Figure 1. UML activity diagram for our in silico sRNA prediction 
model. (A) The first part of this process involves the prediction of 
sRNA protein-binding sites (RIT prediction in this study) and extraction 
of the flanking sequences. (B) Core software for sRNA analysis 
and discovery based on a combination of comparative genomics, RNA 
structure prediction and covariation analysis. 



MG1655, we detected 16 959 putative terminators with a 
ΔG°37 < -4 kcal/mol score. The 1504 RITs located close to 
the stop codon (from -25 to +60 nt) on the same DNA 
strand as a CDS were automatically removed from the 
data set. The remaining putative terminators and the 
200-nt upstream sequences were considered as sRNA candidate 
signatures. Their sequence conservation was 
analyzed using FASTA 3.4 software (43) against 44 
complete genomes of Enterobacteria (Genbank database, 
24/07/2007). Insignificant hits with an e-value >0.0001 
were excluded. MASR software was used to transform 
FASTA pairwise alignments into a multiple alignment. RNA 
structure predictions of sRNA signature candidates were 
done with the Mfold 3.2 program (44). The CSSR 
program, by combining MASR multiple alignments and 
Mfold predictions, detects conserved RNA structures 
and the presence of covariations (see supplementary 
data for a description of MASR and CSSR). To identify 
the most probable sRNA genes, candidates were ranked 
according to their RIT scores (Supplementary Table S2). 
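
The selection of orphan terminators described above (a ΔG°37 threshold, then removal of RITs lying from -25 to +60 nt around the stop codon of a CDS on the same strand) can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Terminator` and `CDS` records, their field names and the coordinate convention (position of the first RIT nucleotide, offsets measured along the CDS strand) are hypothetical and do not reproduce the actual implementation. 

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Terminator:
    start: int      # genome coordinate of the first RIT nucleotide (assumed)
    strand: str     # "+" or "-"
    delta_g: float  # RIT score in kcal/mol


@dataclass(frozen=True)
class CDS:
    stop: int       # genome coordinate of the stop codon (assumed)
    strand: str     # "+" or "-"


def offset_from_stop(term, cds):
    """Signed distance from the stop codon, positive when downstream
    of the stop codon along the strand of the CDS."""
    if cds.strand == "+":
        return term.start - cds.stop
    return cds.stop - term.start


def is_cds_terminator(term, cds_list, upstream=25, downstream=60):
    """True if the RIT sits from -25 to +60 nt around a stop codon
    on the same DNA strand as that CDS."""
    return any(
        term.strand == cds.strand
        and -upstream <= offset_from_stop(term, cds) <= downstream
        for cds in cds_list
    )


def orphan_terminators(terms, cds_list, max_delta_g=-4.0):
    """Keep RITs scoring below the threshold that do not terminate a CDS."""
    return [
        t for t in terms
        if t.delta_g < max_delta_g and not is_cds_terminator(t, cds_list)
    ]
```

In this sketch, a terminator on the strand opposite to a CDS is always retained, which is what allows antisense candidates to be examined downstream. 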

Our model identified sRNA candidates associated with 
an RIT within CDSs. However, the large number of candidates 
identified in E. coli MG1655 (>2000 antisense 
and >3000 sense sRNA candidates) suggested that these 
included a high number of false positives. We therefore 
filtered out sense and antisense candidates in which 
the ΔG°37 score of the RIT was not below -8 kcal/mol. 
Finally, we scored sRNA candidates from E. coli 
MG1655 on the basis of their RIT scores, which were weighted 
by the number of covariation pairs found by CSSR. 
Threshold values of -4 kcal/mol or -8 kcal/mol for the 
RIT score and a requirement for at least two covariations, 
including one in the RIT stem, led to the prediction of 
1867 sRNA candidates that could be classified into eight 
different groups according to their position relative to 
adjacent CDSs (Table 2). In order to maximize the prediction 
of non-coding sRNAs, small CDSs were tentatively 
predicted using Glimmer2 software (45). 
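
The acceptance criteria stated above can be summarized in a short Python sketch. It assumes that the stricter -8 kcal/mol threshold applies to any sense or antisense candidate lying within a CDS and that the -4 kcal/mol threshold applies otherwise; the function and parameter names are hypothetical, and the weighting of the RIT score by covariation pairs is not reproduced because its formula is not given here. 

```python
def passes_filters(delta_g, n_covariations, n_stem_covariations, within_cds):
    """Return True if a candidate meets the thresholds described in the text.

    delta_g             -- RIT score (kcal/mol)
    n_covariations      -- covariations found in the predicted RNA structure
    n_stem_covariations -- covariations located in the RIT stem
    within_cds          -- True for sense/antisense candidates inside a CDS
    """
    threshold = -8.0 if within_cds else -4.0
    return (
        delta_g < threshold
        and n_covariations >= 2
        and n_stem_covariations >= 1
    )


# Hypothetical candidates: the same RIT score passes in an intergenic
# region but fails for a candidate located within a CDS.
print(passes_filters(-5.0, 2, 1, within_cds=False))
print(passes_filters(-5.0, 2, 1, within_cds=True))
```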

Efficiency of the in silico model 

We first tested whether the use of covariations efficiently 
selected true-positive sRNAs and rejected true-negative 
candidates by using our in silico model to analyze the 
101 known sRNAs from the E. coli MG1655 strain 
(Supplementary Table S4), which included 18 asRNAs. 
All the sRNA sequences were submitted directly to 
the core software, bypassing the RIT predictions 
(Figure 1B). The core software identified 77 (92.7%) of 
the sRNAs located in the IGR and 16 (88.9%) of the 
asRNAs as putative candidates. The statistical significance 
of the covariations identified by the Covariation Search in 
Small RNAs software (CSSR) was evaluated by shuffling 
the 101 sRNA multiple alignments using the Altschul and 
Erikson shuffle algorithm (25). Under these conditions, the 
total number of covariations found by CSSR in sRNAs 
was 73.7% lower than for the unshuffled data set, suggesting 
that most of the predicted covariations were statistically 
significant. 
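
The idea of counting covariations and comparing the count with a shuffled control can be sketched in Python as follows. This is not the CSSR algorithm, and the shuffle shown is a plain permutation of alignment columns rather than the Altschul and Erikson procedure, which additionally preserves dinucleotide composition. The definition of a covariation used here (both positions of a predicted base pair differ from the reference while still forming a Watson-Crick or G-U pair) and all names are assumptions made for illustration. 

```python
import random

# Pairs treated as compatible with an RNA stem (Watson-Crick plus G-U wobble).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def count_covariations(alignment, base_pairs):
    """Count predicted base pairs supported by at least one covariation.

    alignment  -- equal-length RNA strings; the first one is the reference
    base_pairs -- (i, j) 0-based column indices paired in the predicted structure
    """
    reference, homologs = alignment[0], alignment[1:]
    count = 0
    for i, j in base_pairs:
        for seq in homologs:
            pair = (seq[i], seq[j])
            # Both bases changed, yet pairing is preserved.
            if pair in PAIRS and seq[i] != reference[i] and seq[j] != reference[j]:
                count += 1
                break
    return count


def shuffle_columns(alignment, seed=0):
    """Simplified null model: permute alignment columns (same order in every row)."""
    rng = random.Random(seed)
    order = list(range(len(alignment[0])))
    rng.shuffle(order)
    return ["".join(seq[k] for k in order) for seq in alignment]


# Hypothetical three-sequence alignment of a small hairpin.
alignment = ["GGGAAACCC", "GAGAAACUC", "GGGAAACCC"]
stem = [(0, 8), (1, 7), (2, 6)]
print(count_covariations(alignment, stem))
print(count_covariations(shuffle_columns(alignment), stem))
```

Comparing the first count with counts obtained over many shuffled alignments gives a rough impression of how often covariations arise by chance under this simplified null model. 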

We assessed the efficiency of our in silico model as an 
sRNA genefinder by its ability to re-predict known bona 
fide sRNAs with an RIT in six complete genome sequences 
(Table 3). Globally, our in silico model detected 
known sRNAs with efficiencies of 70.1% and 71.3% for 
IGR-located sRNAs and asRNAs, respectively. In the 
case of E. coli MG1655, among the sRNAs with an RIT 
that were not identified, the rybB and rydC genes have an RIT 
with a loop size that exceeds the maximum length 
tolerated by our method. The other candidates that were 
not identified, rdlA, rdlC, sokA, sokC, sokE and sokX, 
were all cis-regulatory sRNAs. We suggest that putative 
structural constraints apply to these sRNAs, 
leading to the use of atypical RITs. The E. coli MG1655 
strain transcriptome was recently analyzed in an RNA-seq 
experiment, and 5 out of the 10 newly confirmed sRNAs 
were re-predicted by our in silico analysis (47). Confirmed 
sRNAs from a published RNA-seq analysis of S. aureus 
N315 were compared to our data, and 62.5% of the 
transcribed sRNAs (with and without RIT) were 
re-predicted (48). Given that our in silico model was able 
to predict candidates irrespective of their expression, we 
were able to re-identify four known sRNAs (RNAIII, 
Sau-02, Sau-30, RsaE) that were absent from the RNA-seq 
data (48). 
Table 2. Summary of sRNA candidates identified in silico 

Strain | Disease | IGR | asRNA | 5' asRNA | 3' asRNA | 5' & 3' asRNA | 5' UTR | 3' UTR | sense RNA 

Escherichia coli 
MG1655 | L. S. | 195 | 452 | 74 | 142 | 73 | 89 | 199 | 643 
UTI89 | Cys. | 199 | 398 | 66 | 95 | 77 | 96 | 170 | 527 
536 | Pyl. | 191 | 388 | 66 | 107 | 54 | 73 | 140 | 496 
AL862 | Sep. | 9 | 6 | 2 | 2 | 0 | 3 | 3 | 4 
S88 | Men. | 212 | 430 | 63 | 103 | 85 | 90 | 154 | 532 

Streptococcus agalactiae 
NEM316 | Sep. | 41 | 63 | 12 | 24 | 6 | 5 | 21 | 25 

IGR, intergenic region; asRNA, sRNA antisense to a CDS; 5' asRNA, antisense to the 5'-end of a CDS; 3' asRNA, antisense to the 3'-end of a 
CDS; 5' UTR, 5' untranslated region of a CDS; 3' UTR, 3' untranslated region of a CDS. For classification of the sRNA candidates into one of 
these categories, the first nucleotide of the RIT was used as the position reference of the candidate. This nucleotide had to be on the opposite DNA 
strand, between nucleotides -50 nt and +15 nt around the ATG codon (5' asRNA), from position +15 nt with respect to the ATG codon to position 
-50 nt near the stop codon (asRNA) or from -50 nt to +15 nt around the stop codon (3' asRNA). When candidates were on the same DNA strand as 
the CDS, the window around the first RIT nucleotide was < -100 nt before the ATG codon (5' UTR), < +200 nt after the stop codon (3' UTR) and 
from +50 nt after the ATG to -50 nt before the stop codon (seRNA). All candidates outside a CDS not included in a previous category are referred 
to as IGR candidates. All candidates had to have an RIT with a score of ΔG°37 < -4 kcal/mol, and at least two covariations had to be present in the RNA 
structure, including the stem of the RIT. For asRNA and seRNA candidates, ΔG°37 had to be below -8 kcal/mol. L. S., laboratory strain; Cys., 
cystitis; Pyl., pyelonephritis; Sep., sepsis; Men., meningitis. Only the PAI-I(AL862) sequence of the AL862 strain was analyzed. 



Table 3. Efficiency of the in silico process for predicting previously known sRNAs in six bacterial species 

Gram | Strain | Total known sRNAs | sRNA genes in IGR: known sRNAs with RIT | sRNA genes in IGR: success (%) | asRNA genes in CDS: known asRNAs with RIT (a) | asRNA genes in CDS: success (%) (a) 

- | E. coli MG1655 | 101 | 60 | 86.7 | 5 | 60 
- | S. typhimurium LT2 | 79 | 51 | 70.6 | 0 | NA (b) 
- | V. cholerae O1 | 40 | 31 | 90.4 | 9 | 55.5 
- | P. aeruginosa PAO1 | 24 | 24 | 66.7 | 0 | NA (b) 
+ | S. aureus N315 | 55 | 38 | 76.3 | 1 | 100 
+ | L. monocytogenes EGD-e | 50 | 27 | 29.6 | 10 | 70 

(a) The RITs of the published asRNA genes were not characterized by the authors. 

The efficiency of sRNA prediction was calculated from data for bona fide sRNA genes. Only sRNAs that had been experimentally validated by 
Northern blots, 5' RACE and RT-PCR were taken into account. We excluded unconfirmed sRNAs from RNA-seq or tiling microarray data and 5' 
or 3' UTRs from mRNAs. 
(b) NA, Not Applicable. 



Screening for new sRNAs from ExPEC Escherichia coli 
isolates 

Escherichia coli is a species encompassing a broad variety 
of commensal and pathogenic strains that have diverged 
due to a high rate of genetic exchange (49). Using an 
exhaustive and hand-curated database of sRNA genes found 
in the genus Escherichia, we recently updated the annotation 
of known sRNAs in the genome of the MG1655 
strain (Supplementary Table S4). We also reported that 
these genes were structurally well conserved in the genomes 
of six recently sequenced pathogenic and commensal strains, 
although their copy number may vary (49). 
These data suggested that unidentified sRNAs that are 
absent from the MG1655 strain might be involved in regulatory 
pathways specific to pathogenic isolates. 

We thus focused our searches for sRNAs on ExPEC 
strains, a group of major human pathogens responsible 
for urinary tract infections, meningitis, sepsis, etc. (50). 
Despite extensive studies, no gene or pool of genes specifically 
linked to extra-intestinal virulence has been 
identified in these strains. This strongly suggests that 
virulence results from multi-factorial processes depending 
on the expression of both core-genome and strain-specific 
genes (49). We thus investigated the possible role of 
ExPEC-specific sRNAs in virulence control by applying 
our in silico model to the entire genomes of three clinical 
isolates (UTI89, 536 and S88), which are associated with 
cystitis, pyelonephritis and newborn meningitis, respectively 
(49,51,52). We also analyzed the sequence of the 
tRNA(Phe)-inserted PAI from the AL862 strain (PAI-I(AL862)), 
a sepsis isolate (30). 

The RIT-associated sRNA candidates from the whole 
genomes or PAI-I(AL862) sequences were collected with our 
model and classified according to their genomic coordinates 
(Supplementary Table S3), as summarized in Table 2. 
In each genome, we identified more than 1500 sRNA candidate 
genes. The number of putative sRNA genes located 
in the IGRs did not exceed 200 (~10% of all candidates), 
a finding consistent with other in silico searches (19). Most 
of these candidate genes were located in the core genome 
(~81.8% on average) rather than in PAIs (data not 
shown), suggesting that they may regulate the general cell 





Figure 2. Comparative analysis of sRNAs identified by our in silico model based on sequence conservation among ExPEC strains. Venn diagram 
representations of the number of sRNAs predicted in IGRs (A) and of asRNAs predicted in CDSs (B) for the UTI89, 536, S88 and MG1655 strains. 



metabolism (Figure 2A). We detected numerous asRNAs 
among the sRNA candidates (~40% of all candidates), 
partially or fully antisense to a CDS, that were dispersed 
throughout the genome sequences, including their PAIs 
(data not shown). The partially antisense candidates 
(~15% of all candidates) overlap either the upstream or 
downstream regions of a CDS, suggesting that they 
control the translation and/or stability of the complementary 
mRNA. In the case of the 59 000 bp PAI-I(AL862) 
sequence, 29 sRNA gene candidates were predicted, 10 
(34.5%) being asRNAs, a percentage similar to that 
found in other ExPEC genomes. As shown for the MG1655 
analysis, many candidates were found in sense orientation 
within CDSs (~34% of all candidates). 

Given the large number of sRNA candidates, we 
focused on those genetically associated with clusters of 
genes known to be involved in extra-intestinal virulence, 
in particular the ExPEC-specific PAI-I(AL862) (E. coli 
AL862), PAI-II(536) (E. coli 536) and the fim gene cluster 
encoding type 1 fimbriae (E. coli 536). Screening by RT-PCR 
analysis revealed that six out of the seven sRNA 
candidates from PAI-I(AL862) were transcribed: one candidate 
was located in an IGR and five were asRNAs 
(Supplementary Figure S1A). We evaluated the sensitivity 
of our RT-PCR method by carrying out hemi-nested 
RT-PCR experiments (53) (Supplementary Figure S2A). 
This analysis did not confirm expression of the SQ24 and 
SQ27 asRNAs, both targeting putative transposase 
CDSs (Supplementary Figure S2B). Expression and size 
of two of the four remaining sRNAs were analyzed by 
Northern blot due to their co-localization with pathogenic 
factor genes (Figure 3A). The same transcription analysis 
was carried out for 10 sRNA candidates from the genome 
of the E. coli 536 strain, including nine candidates located 
in the PAI-II(536) sequence and one in the fim gene cluster: two 
candidates were located in IGRs and eight were asRNAs. 
All candidates were expressed in our growth conditions, 
as shown by our expression screening by RT-PCR 
(Supplementary Figure S1B) associated with hemi-nested 
RT-PCR performed to confirm the specificity of the RT-PCR 
reactions for all sRNAs (data not shown). Northern blot 
analyses of several candidates were done to confirm the size 
and expression of selected relevant sRNAs (Figure 3B). 

Comparative sequence analysis with our in silico model showed that all
but one of the 14 validated sRNAs of E. coli AL862 and 536 were
frequently found in the genomes of sequenced ExPEC isolates but not in
strains of other E. coli pathotypes. The remaining sRNA, SQ8017, was
located in the fim gene cluster encoding the virulence-associated type
1 fimbriae present in almost all commensal and pathogenic strains. Most
of the new sRNA genes identified in this study are asRNAs genetically
associated with a cluster of genes involved in ExPEC pathogenicity,
which suggests that they may be involved in virulence control
(Table 4). Data for other expressed or untested candidates are shown in
Supplementary Data.

The FimR asRNA from E. coli 536 up-regulates the 
expression of type 1 fimbriae 

In E. coli, type 1 fimbriae play a role in the development of urinary
tract infections by mediating adhesion to specific receptors on the
uroepithelium. During the pathogenesis of cystitis, type 1 fimbriae
promote the invasion of bladder cells and the formation of
intracellular communities (54), but they are also involved in biofilm
formation (55). The fim gene cluster is composed of nine genes
(Supplementary Figure S4) whose expression is controlled by phase
variation and various regulators. As the SQ8017 asRNA and the fimD CDS
are located in the same genomic locus, we hypothesized that this asRNA
controlled the expression of the fim gene cluster and we therefore
renamed it FimR. The transcription start site of fimR was mapped by
5' RACE to position T4852969 in the sequence of the E. coli 536 strain
(Table 4). Analysis of the fimR promoter region revealed the presence
of a putative promoter. The 'AA' tract from the -35 box, the invariable
C-residue from the -10 box, the 17 bp spacer, the 6 bp discriminator
sequence and the -1 T-residue were all observed, indicating that this
prediction may be reliable. These features suggest that FimR expression
is controlled by environmental stimuli (56). Given the position of the
fimR promoter and RIT, the calculated RNA size was ~440 nt, compatible
with the ~410 nt long RNA observed by Northern blot (Figure 3).

Type 1 fimbriae mediate adhesion to mannose-containing receptors, a
biological trait quantified in vitro with the yeast agglutination assay
(57,41). The specificity of the assay for evaluating the expression of
the fim gene cluster of E. coli 536 was confirmed with a 536 Δfim
mutant that does not agglutinate. We tested our hypothesis by
constructing derivatives of strain 536 over-expressing FimR or a FimR
antisense sRNA (antiFimR) and assessing the yeast agglutination titer.
The expression of antiFimR should inactivate the FimR regulation
pathway by competing with the mRNA substrate of FimR.

Figure 3. Northern blot analysis of some sRNAs from E. coli AL862, E. coli 536 and S. agalactiae NEM316 strains. Expression analysis of 7 sRNA
candidates co-localized with virulence factors (see Table 4) and identified in (A) E. coli AL862, (B) E. coli 536 and (C) S. agalactiae NEM316 strains.
Expression was analyzed in two phases of growth (E, exponential; S, late stationary) in LB and M9 + 0.4% pyruvate (M9py) media for E. coli or TH
and RPMI1640 + 0.4% glucose media for S. agalactiae. Expression of the constitutively transcribed 5S ribosomal gene was used as loading control.
The C0465 sRNA, which is expressed only in early stationary phase in the E. coli MG1655 strain, was used as a negative expression control (46). Notes:
ig, sRNA gene located in an IGR; as, sRNA gene located at a position antisense to a CDS (asRNA). Black arrows indicate hybridized sRNA
molecules.



The primary transcript of the fimR gene including its RIT was cloned
under the control of the P;^ promoter of pZE2R-gfp to give the
pZE2R-fimR plasmid. We also constructed pZE21-antifimR by cloning the
same primary transcript in antisense orientation under the control of
the PLtetO-1 promoter. These fimR, antifimR and mock plasmids were

Table 4. List of validated sRNA genes located close to virulence-related genes

Candidate | sRNA | Origin | Loc.(a) | 5'-end(b) | 3'-end(c) | Type(d) | Target genes(e) | Target function | O. g.(f) | ExPEC specific?(g) | Score(h) (N/kcal/mol)
Escherichia coli
SQ8164 | IntP4R | E. c. 536 | PAI-II | 4 735 462 | 4 735 232 | asRNA | intP4 | PAI DNA mobility | < > | No | 10/-26.28
SQ7560 | PrfR | E. c. 536 | PAI-II | 4 747 389 | 4 747 630 | asRNA | prfF | Adhesion | > < | Yes | 3/-12.64
SQ7575 | HlyR | E. c. 536 | PAI-II | 4 763 726 | 4 763 963 | asRNA | hlyA | Hemolysis | > < | Yes | 2/-5.76
SQ7606 | HaeR | E. c. 536 | PAI-II | 4 783 731 | 4 783 731 | asRNA | ECPJSSO | Filamentous haemagglutinin | > < | Yes | 9/-6.52
SQ8017 | FimR | E. c. 536 | Core | 4 852 969* | 4 852 518 | asRNA | fimD | Adhesion | < > | No | 15/-8.49
SQ109 | AfaR | E. c. AL862 | PAI-I | 56 564* | 56 332 | IGR | afaS | Adhesion | > < | Yes | 2/-5.2
SQ19 | IntR | E. c. AL862 | PAI-I | 58 845 | 59 076 | asRNA | Int | PAI DNA mobility | < > | No | 12/-14.94
Streptococcus agalactiae
SQ18 | SQ18 | S. a. NEM316 | Core | 47 857* | 47 734 | asRNA | gbs0031 | Surface exposed protein | > < | N.A. | 3/-10
SQ340 | SQ340 | S. a. NEM316 | PAI-X | 1 163 702* | 1 163 779 | IGR | ghslUS | Transposase of TnGBS2 | > < | N.A. | 3/-10.5
SQ893 | SQ893 | S. a. NEM316 | Core | 1 300 661 | 1 300 360 | IGR | gbs1263 | Fibronectin binding protein | < > | N.A. | 3/-4
SQ407 | SQ407 | S. a. NEM316 | PAI-XII | 1 350 419 | 1 350 658 | asRNA | lmb | Laminin binding protein | > < | N.A. | 11/-11.5
SQ485 | SQ485 | S. a. NEM316 | Core | 1 655 610 | 1 655 852 | asRNA | gbs1588/gbs1589 | Putative ABC transporter | > < | N.A. | 9/-10.3
SQ1004 | SQ1004 | S. a. NEM316 | PAI-XIII | 2 052 153 | 2 052 383 | IGR | gbs1987 | Streptomycin resistance | > < | N.A. | 3/-7.6

(a) Localization of the sRNA gene. Core, core genome; PAI, pathogenicity islands.
(b) The 5'-end of the sRNA candidate is arbitrarily located 200 bp upstream from the first nucleotide of the predicted RIT. An asterisk indicates the 5'
triphosphate RNA end determined by 5' RACE. The 5' ends of the SQ109 (E. coli AL862) and SQ340 (S. agalactiae NEM316) sRNAs were determined
in another study (CP., personal communication).
(c) The 3'-end of the sRNA candidate is defined as the last nucleotide of the RIT poly-uracil tail.
(d) Type of sRNA candidate gene locus. IGR, intergenic region; asRNA, sRNA antisense to a CDS.
(e) Antisense sRNA predicted target mRNA. The sRNA genes located in an IGR may regulate adjacent genes by an antisense mechanism.
(f) O. g., orientation of genes (order sRNA/mRNA).
(g) Specificity was determined by FASTA analysis against the Genbank database.
(h) N, number of covariations identified/RIT score in kcal/mol.
E. c., Escherichia coli; S. a., Streptococcus agalactiae; N.A., as printed in the table.
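
The coordinate convention given in the notes to Table 4 (5'-end placed 200 bp upstream of the first nucleotide of the predicted RIT, 3'-end at the last nucleotide of the poly-uracil tail) can be written down as a short sketch. The function name, the argument layout and the example coordinates are hypothetical, and the sketch assumes a linear coordinate system without wrap-around at the origin.

```python
def candidate_ends(rit_start: int, rit_end: int, strand: str, upstream: int = 200) -> tuple[int, int]:
    """Return (5'-end, 3'-end) of an sRNA candidate from its predicted terminator.

    rit_start <= rit_end are 1-based genomic coordinates of the RIT,
    poly-U tail included; strand is the strand carrying the candidate.
    """
    if strand == "+":
        # Transcription runs left to right: the RIT begins at rit_start.
        return max(1, rit_start - upstream), rit_end
    if strand == "-":
        # Transcription runs right to left: the RIT begins at rit_end.
        return rit_end + upstream, rit_start
    raise ValueError("strand must be '+' or '-'")
```

With an invented 31 nt terminator at 1000..1030, the candidate spans 800..1030 on the "+" strand and 1230..1000 on the "-" strand, i.e. 231 nt in both cases. Spans of this order are what one would expect for the Table 4 entries whose 5'-end was not mapped by 5' RACE, if their terminators are roughly 30 to 40 nt long.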



Table 5. Yeast agglutination assays for E. coli 536 derivatives

Strain | Yeast agglutination titer
536 + pZE2R-null | 1/16
536 + pZE2R-fimR | 1/64
536 Δfim::cat + pZE2R-null | NO
536 Δfim::cat + pZE2R-fimR | NO
536 + pZE21-null | 1/16
536 + pZE21-antifimR | 1/4
536 Δfim::cat + pZE21-null | NO
536 Δfim::cat + pZE21-antifimR | NO
536 | 1/16
536 Δhfq::KmFRT | NO

The level of expression of type 1 fimbriae was assessed in E. coli 536
wild type and mutant strains expressing the FimR sRNA, the antiFimR
sRNA or mock plasmids. None of the 536 Δfim strains agglutinated yeasts,
indicating that the agglutination phenotypes resulted from the
expression of type 1 fimbriae. NO: not observable.



introduced into the 536 and 536 Δfim strains. As expected, FimR and
antiFimR over-expression in E. coli 536 significantly modified the
agglutination titer (4-fold increase and 4-fold decrease, respectively;
Table 5). These findings indicate that FimR up-regulates the production
of type 1 fimbriae.
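
The fold changes quoted for the agglutination titers follow directly from the dilution ratios in Table 5. The sketch below makes the arithmetic explicit; it assumes that the titer is the highest dilution still giving visible agglutination, so that a smaller fraction means more fimbriae, and the function name is hypothetical.

```python
from fractions import Fraction


def titer_fold_change(reference_titer: str, test_titer: str) -> Fraction:
    """Fold change of the test strain relative to the reference strain.

    Titers are written as dilutions such as "1/16". A result above 1 is an
    increase in agglutinating capacity, a result below 1 is a decrease.
    """
    return Fraction(reference_titer) / Fraction(test_titer)


# Values taken from Table 5.
print(titer_fold_change("1/16", "1/64"))  # FimR over-expression vs mock plasmid
print(titer_fold_change("1/16", "1/4"))   # antiFimR over-expression vs mock plasmid
```

The first call gives 4 (a 4-fold increase) and the second gives 1/4 (a 4-fold decrease), matching the values reported in the text.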

FimR asRNA binds the fimD mRNA and positively 
regulates type 1 fimbriae expression 

We assessed the putative base-pairing interaction of FimR and fimD
mRNA using a translational control and target recognition system (34).
A translational fusion of the fimD and gfp genes was constructed by
fusing the full stop-codon-less fimD CDS to the ATG-less gfp gene from
the pXG10 plasmid. Expression of the fimD::gfp fusion was monitored by
quantitative RT-PCR and Western blot in E. coli TOP10 (a Δfim strain)
harboring the pXGfimD::gfp target plasmid or pXG-0 (no target control)
and either the pZE2R-fimR or pZE2R-null plasmid (Figure 4). Comparison
of the relative levels of expression of fimD::gfp mRNA in the
pZE2R-fimR and pZE2R-null bearing strains showed that FimR
over-expression was associated with an 8-fold increase in the amount of
fusion mRNA (Figure 4A). Western blot experiments with antibodies
directed against the GFP protein revealed a 2-fold increase in
FimD::Gfp protein expression, consistent with the quantitative RT-PCR
analysis (Figure 4A). Accumulation of the fimD::gfp and FimR
transcripts strongly suggested that these RNA molecules may be
stabilized when co-expressed (Figure 4A). Post-transcriptional
regulation of fimD mRNA by FimR thus likely occurs through antisense
base pairing between the two RNA molecules.

Figure 4. Over-expression of FimR and SQ18 antisense sRNAs regulates the fimD and gbs0031 target genes, respectively. (A) Analysis by Western
blot and quantitative RT-PCR of gfp and FimR gene expression in E. coli strain TOP10 harboring pZE2R-fimR or pZE2R-null plasmids combined
with pXG-0 (no gfp target control) or pXGfimD::gfp target expression plasmids. The four isolates were cultured in LB medium at 37°C until they
reached an OD600 of 0.9. Quantitative expression of the gfp fusion gene was normalized to 1.0 for the TOP10 + pZE2R-null + pXGfimD::gfp strain.
FimR expression was normalized to 1.0 for the TOP10 + pZE2R-fimR + pXG-0 strain. (B) Western blot and quantitative RT-PCR analysis were
performed as described in (A) but in a Δhfq context. Asterisks indicate a significant difference between mean values in unpaired t-tests (P < 0.01).

We investigated the role of FimR in vivo by carrying out a more
detailed analysis, by quantitative RT-PCR, of the expression of the
fimBE and fimAICDFGH operons and of the FimR asRNA in E. coli 536
carrying pZE2R-fimR, pZE21-antifimR, or mock plasmids. Over-expression
of FimR from a multicopy plasmid (~17 copies per chromosome equivalent)
increased the expression of fimB to fimH 2.34-fold (Figure 5A). This
result suggests that FimR positively regulates not only fimD but also
the entire fim gene cluster. This hypothesis was confirmed by analyzing
the relative expression level of the fim genes in strain 536 carrying
pZE21-antifimR. Over-expression of antiFimR decreased fim expression
4.18-fold, to a value lower than that obtained with the mock plasmid
(Figure 5B), indicating that FimR inhibition down-regulated fim gene
expression. Furthermore, yeast agglutination assays with E. coli
536 + pZE2R-fimR cultured in human urine for 24 h showed that FimR
increased the agglutination titer to the levels found with bacteria
grown in LB medium (data not shown). It is thus likely that FimR
controls type 1 fimbriae-mediated adhesion in vivo during host
colonization.
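
This section reports relative expression values without restating how they were computed. As an illustration only, the sketch below uses the widely used 2^-ΔΔCt calculation, which is an assumption here and not necessarily the method applied in this study; the function names and the Ct values in the usage note are invented.

```python
def relative_expression(ct_target_test: float, ct_ref_test: float,
                        ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Expression ratio (test/control) by the 2^-ddCt calculation.

    Each target Ct is first normalized to a reference gene measured in the
    same sample, then the test sample is compared with the control sample.
    """
    delta_test = ct_target_test - ct_ref_test
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl
    return 2.0 ** -(delta_test - delta_ctrl)


def describe(ratio: float) -> str:
    """Express a ratio in the 'n-fold increase/decrease' wording used in the text."""
    if ratio >= 1.0:
        return f"{ratio:.2f}-fold increase"
    return f"{1.0 / ratio:.2f}-fold decrease"
```

With invented Ct values, `relative_expression(20.0, 15.0, 22.0, 15.0)` gives a ratio of 4.0. In this wording, the 2.34-fold increase reported for FimR over-expression corresponds to a ratio of 2.34, and the 4.18-fold decrease reported for antiFimR corresponds to a ratio of about 0.24.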

Hfq is required for fimD/FimR base pairing

About 40% of the known sRNAs from E. coli require the Hfq protein to
interact with their targets. Since Hfq contributes to the virulence of
the ExPEC E. coli UTI89 strain (57), we investigated whether this
protein is required for FimR regulation in E. coli 536. To test the
requirement of Hfq for the FimR/fimD interaction, we introduced the
pXGfimD::gfp or pXG-0 plasmids into the TOP10 Δhfq::FRT strain
harboring either the pZE2R-fimR or pZE2R-null plasmid. In contrast to
the variations in gene expression observed in TOP10 cells, quantitative
expression analysis of the fimR and gfp genes in TOP10 Δhfq::FRT
revealed no significant differences in either RNA or protein levels in
the presence or absence of FimR (Figure 4B). The loss of FimR-dependent
regulation indicated that the Hfq protein was required for the binding
of FimR to fimD::gfp mRNA.

We investigated the role of Hfq in vivo by constructing the E. coli
536 Δhfq::KmFRT strain and assessing adhesion mediated by type 1
fimbriae with a yeast agglutination assay. As expected, loss of hfq
expression led to the loss of visible agglutination, suggesting that
fewer type 1 fimbriae were produced in the hfq mutant (Table 5). Next,
we assessed the relative expression levels of the fimBE and fimAICDFGH
operons and of the FimR asRNA in E. coli 536 Δhfq::KmFRT by
quantitative RT-PCR. As expected, loss of hfq expression decreased
fimBE and fimAICDFGH mRNA production by ~4-fold on average and that of
the FimR asRNA by ~6-fold. The fimA gene, encoding the major structural
subunit of type 1 fimbriae (~1000 to 10 000 monomers per fimbria), was
affected more severely, with a ~7-fold decrease. Taken together, these
results suggest that Hfq regulates type 1 fimbriae synthesis by
mediating base pairing of FimR with fimD mRNA.

The FimR regulon controls biofilm development 
and bacterial motility 

We checked whether the expression of fim genes was linked to FimR
regulation and controlled virulence by investigating various
fimbriae-associated phenotypes in E. coli 536 expressing the fimR and
antifimR genes.

Figure 5. FimR sRNA up-regulates type 1 fimbriae gene expression
in vivo. Quantitative real-time RT-PCR analysis of expression
of the fimBEAICDFGH gene cluster was performed in (A) E. coli
536 + pZE2R-fimR relative to E. coli 536 + pZE2R-null, (B) E. coli
536 + pZE21-antifimR relative to E. coli 536 + pZE21-null and
(C) 536 Δhfq::KmFRT relative to 536 strains, cultured in LB
medium statically at 37°C for 24 h (stationary phase).

The adhesion mediated by type 1 fimbriae is an important factor in
biofilm formation (55). As FimR enhanced type 1 fimbriae production, we
investigated the effect of FimR on biofilm formation for E. coli 536
derivatives carrying pZE2R-fimR, pZE21-antifimR, or mock plasmids. In
our conditions, the strains carrying the pZE2R-fimR or the mock
plasmids displayed similar levels of biofilm formation, whereas the
E. coli 536 + pZE21-antifimR isolate formed no detectable biofilm
(data not shown). These observations suggest that FimR is required for
biofilm development.



The production of type 1 fimbriae and that of flagella have been shown
to be co-regulated in various pathogenic E. coli isolates (55). We
therefore analyzed the relation between FimR and motility by performing
motility tests on various E. coli 536-derived strains. Compared to a
null plasmid-bearing strain, motility was unaffected by the
over-expression of FimR but significantly decreased by over-expression
of antiFimR, resulting in virtually non-motile bacteria (data not
shown). Thus, under laboratory growth conditions, fimR expression is
linked to type 1 fimbriae-mediated biofilm formation and to bacterial
motility, two phenotypes known to be important in the urovirulence of
ExPEC strains.



Identification of sRNAs from S. agalactiae 

The Gram-positive bacterium S. agalactiae (also referred to as Group B
Streptococcus, GBS) is a major cause of bacterial sepsis, pneumonia and
meningitis in newborns and is also responsible for pregnancy-related
morbidity (58). As our in silico model is based on the recognition of
RIT-associated signatures found in both Gram-negative and Gram-positive
bacteria, we assessed whether our program could also efficiently
predict asRNAs in Gram-positive bacteria by searching for sRNAs in
S. agalactiae strain NEM316. All steps of the process were identical to
those used for E. coli except for the following modifications:
TransTerm HP was used to predict RITs and comparative genomics analyses
were carried out with a database of Lactobacillales genome sequences
(Genbank release of 07/06/2008). Our in silico search revealed 197 sRNA
candidates with genes located in IGRs, while others were partially or
fully antisense to CDSs (Table 2). In addition, some candidates were
located upstream or downstream from a CDS and were putative
mRNA-encoded regulatory elements (e.g. riboswitches). Interestingly, as
in the E. coli analysis, sense RNA candidates were also predicted.

The genes of sRNA candidates were distributed throughout the genome and
we analyzed by RT-PCR the expression of 30 of the 197 sRNA candidates,
located in both the core genome and PAIs. The expression of the tmRNA
and 5S rRNA genes was used as a positive control. The analysis revealed
that 26 of the 30 predicted sRNA candidates were expressed, thus
demonstrating the versatility and efficiency of our in silico model
(Supplementary Figure S1C).

To confirm the RT-PCR results, we further characterized the 26
RT-PCR-positive sRNA candidates by Northern blot analysis with
32P-labeled oligonucleotides. Ten candidates gave a strong
hybridization signal. The absent or weak signal obtained for the other
candidates may be due to the lower sensitivity of the Northern blot
technique compared to RT-PCR (data not shown). The SQ18, SQ485, SQ655
and SQ893 sRNAs gave multiple bands, suggesting cleavage by
ribonucleases or transcription initiated from multiple promoters
(Figure 3, Supplementary Figure S5). Four validated sRNAs were found to
be located close to or antisense to CDSs involved in the pathogenicity
of S. agalactiae (Table 4). Comparative genomic analysis using FASTA3
indicated that none of the sRNAs described here were present in
sequenced strains of the phylogenetically related pathogen
Streptococcus pyogenes and that none of the sRNAs previously described
in S. pyogenes were present in S. agalactiae, suggesting that these
molecules display a high degree of species specificity in the genus
Streptococcus. However, as recently reported, one of our sRNA
candidates (SQ517) has an ortholog (csRNA12) in
Streptococcus pneumoniae (59).

The SQ18, SQ485 and SQ893 sRNAs from S. agalactiae 
NEM316 modulate expression of adjacent genes 

As shown for the ExPEC strains, some sRNAs were found to be near
virulence-related gene clusters. We therefore investigated whether
over-expression of the SQ18 and SQ485 asRNAs and of the SQ893 sRNA
regulated the expression of other genes in the S. agalactiae NEM316
strain. The primary RNA transcripts of the genes adjacent or antisense
to the SQ18, SQ485 and SQ893 sRNA genes were determined by searching
in silico for putative promoters and terminators. This analysis
revealed that the adjacent mRNA transcripts of gbs0031, gbs1588 and
gbs1263 were putative antisense targets of the SQ18, SQ485 and SQ893
sRNAs, respectively. To test these hypotheses, we cloned each of the
three sRNA genes downstream of the strong promoter Ptet in the shuttle
vector pTCV-erm-^lPtet, giving the pTCV-SQ18, pTCV-SQ485 and pTCV-SQ893
plasmids. These plasmids were introduced into the S. agalactiae NEM316
strain and the expression of the putative target genes was analyzed by
qRT-PCR (Figure 6A). Over-expression of the SQ18 asRNA and the SQ893
sRNA significantly decreased the levels of their respective target
mRNAs, gbs0031 and gbs1263, suggesting that both sRNAs act as negative
regulators. In contrast, over-expression of the SQ485 asRNA led to an
increase in the amount of gbs1588 mRNA, suggesting that this asRNA acts
as a positive regulator (Figure 6A).

The SQ18 asRNA from S. agalactiae NEM316 
down-regulates expression of the Sip gene by an 
antisense mechanism 

A translational control and target recognition system (34) was used to
investigate the putative base pairing between the SQ18 asRNA and the
gbs0031 mRNA, which encodes a surface immunogenic protein (Sip) that
elicits protective immunity against group B streptococci (60). We first
characterized the 5'-end of the primary transcript of SQ18 by 5' RACE.
The 5' triphosphate end was mapped to G47857 and was associated with a
putative CT^ promoter (Table 4). The SQ18 gene was inserted into
pZE2R-gfp to give pZE2R-SQ18 and the stop-codon-less gbs0031 CDS was
fused to the ATG-less gfp gene from pXG10, giving the pXGgbs0031::gfp
plasmid. Four TOP10 strains harboring the pZE2R-SQ18 or pZE2R-null
plasmid combined with the pXGgbs0031::gfp or pXG-0 plasmid were
constructed. The expression of the sRNA and of the fusion mRNA was
analyzed by quantitative RT-PCR and Western blot. Comparison of the
relative levels of expression of gbs0031::gfp mRNA in the pZE2R-SQ18
and pZE2R-null bearing strains showed that SQ18 over-expression was
associated with a 4-fold decrease in the amount of the fusion mRNA
(Figure 6B). Consistently, Western blot experiments carried out with
antibodies directed against GFP (Gbs0031::Gfp) indicated a 2.6-fold
decrease in the amount of the Gfp fusion in the strain over-expressing
SQ18 (Figure 6B). Thus, SQ18 is a negative post-transcriptional
antisense regulator of gbs0031::gfp gene activity when expressed in
E. coli.

Figure 6. SQ18, SQ893 and SQ485 sRNAs controlled the expression of the gbs0031,
gbs1263 and gbs1588 target genes, respectively. (A) Quantitative
real-time RT-PCR analysis of expression of the gbs0031, gbs1263
and gbs1588 genes. The relative expression of the three mRNAs
was determined by comparing the over-expressing strains S. agalactiae
NEM316 + pTCV-SQ18 or pTCV-SQ485 or pTCV-SQ893 against the
wild-type S. agalactiae NEM316 isolate. (B) Analysis by Western
blot and quantitative RT-PCR of the expression of the gfp and
SQ18 genes in the E. coli TOP10 strain harboring pZE2R-SQ18
or mock plasmids combined with pXG-0 (no gfp target
control) or pXGgbs0031::gfp expression plasmids. SQ18 expression
was normalized to 1.0 for the TOP10 + pZE2R-SQ18 + pXG-0 strain.
Asterisks indicate a significant difference between mean values in
unpaired t-tests (P < 0.01).



DISCUSSION 

High-throughput sequencing of bacterial transcripts (RNA-seq) or tiling
microarray experiments showed that sRNA gene diversity is far greater
than expected (8,9,61,62). In particular, these data revealed the
existence of mRNA and asRNA pairs transcribed from genes present at the
same locus, but on opposite DNA strands. There is a growing interest in
the analysis of bacterial sRNAs, in particular their contribution to
gene regulation, including the expression of virulence factors, but the
identification of the full set of sRNA genes by RNA-seq or tiling
microarray remains a difficult task and the experimental costs remain
high. We have thus designed and validated a new in silico model that
efficiently identifies sRNA genes, including asRNAs, in any bacterial
genome, in both IGR and CDS regions. Our analysis of genome sequences
from ExPEC and S. agalactiae, two major human pathogens, predicted the
existence of numerous sRNAs, including asRNAs co-localized with
virulence-associated genes.

Previous in silico methods for identifying de novo sRNAs in bacterial
genomes increased in efficiency over time, but they are still limited
to the analysis of IGRs and do not predict asRNAs that partially or
totally overlap neighboring CDSs. Several sRNAs have been described in
E. coli and other species (1), but few data are available for asRNAs
(4,12,13). Our combination of RIT prediction, comparative genomics and
RNA structure prediction with a scoring system based on a RIT score and
the analysis of covariations identified ~1800 and ~200 sRNA candidates
for the E. coli and S. agalactiae genomes, respectively. The mean
efficiency of our in silico model, based on the analysis of six genomes
and expressed as the percentage of known sRNAs that were predicted, was
estimated to be 70.1% and 71.5% for sRNAs located in IGRs and asRNAs,
respectively (Table 3), which suggests that it is an efficient tool for
analyzing any bacterial genome. Up to now, few in silico models have
been able to identify asRNA genes. The corresponding algorithms, based
on comparative genomic approaches or mathematical/statistical analyses
of RNA secondary structures, were validated only with E. coli genomes
(20,23,24,25) and only a few asRNA candidates were identified. In
addition, these tools were either unable to predict sRNA genes de novo
(25) or lacked validation data supporting their use as reliable asRNA
finders (20,23,24). Our study suggests that our in silico model can
predict asRNA genes fully transcribed from CDS regions in antisense and
possibly in sense orientation. Recent RNA-seq data suggested the
existence of sense sRNAs but no biological function has been identified
to date (9). Globally, we identified here sRNA and asRNA candidates
evenly distributed throughout the genome. Based on the recognition
efficiency of known E. coli sRNAs (Table 3), our approach appears to be
as reliable as currently available algorithms.
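
The efficiency figure discussed above, the percentage of known sRNAs recovered by the prediction, can be sketched as follows. The matching rule (same strand and at least one overlapping nucleotide) is an assumption made for illustration and may differ from the criterion used to build Table 3; all coordinates in the usage note are invented.

```python
Feature = tuple[int, int, str]  # (start, end, strand), 1-based inclusive


def overlaps(a: Feature, b: Feature, min_overlap: int = 1) -> bool:
    """True if a and b lie on the same strand and share at least min_overlap nucleotides."""
    shared = min(a[1], b[1]) - max(a[0], b[0]) + 1
    return a[2] == b[2] and shared >= min_overlap


def recovery_rate(known: list[Feature], predicted: list[Feature]) -> float:
    """Percentage of known sRNAs matched by at least one predicted candidate."""
    found = sum(1 for k in known if any(overlaps(k, p) for p in predicted))
    return 100.0 * found / len(known)


def mean_rate(rates: list[float]) -> float:
    """Mean efficiency over several genomes."""
    return sum(rates) / len(rates)
```

For instance, if three of four known sRNAs in a genome are overlapped on the correct strand by a predicted candidate, `recovery_rate` returns 75.0; averaging such values over the genomes analyzed gives a mean efficiency of the kind reported in the text.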



The main limitation of our approach is that it requires RIT prediction
to detect sRNAs. We initially used RIT prediction to demonstrate that
our in silico model efficiently identified known sRNAs in E. coli
because 72.3% of known sRNA genes located in IGRs have an RIT. As a
consequence, sRNA genes that use atypical RITs or a different
termination process were not predicted with our model. We had
hypothesized that any protein binding site in sRNAs could be the
starting point of our predictive model. Thus, identification of Rho
protein or Hfq binding sites may be a good alternative to enhance our
sRNA prediction model, especially as RNA-seq data for E. coli (10) and
Salmonella species (62) showed that RITs seem to be less frequent in
asRNA genes (<~50%). On the other hand, we used two distinct RIT
prediction models, which might exhibit variable predictive efficiencies
for different bacteria. This approach is also limited by the number of
fully sequenced genomes available and by the requirement for a minimal
level of genetic divergence among these sequences to allow covariation
identification. During our study, 15 E. coli and 3 S. agalactiae
sequences were available and the mutation frequency among the genomes
within these two species was not the same. The sequence conservation
among S. agalactiae strains was higher than it was for the E. coli
strains. Thus, the different RIT prediction efficiencies obtained for
these two bacteria may explain why we identified ten times more
candidates in E. coli than in S. agalactiae.
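
Why covariation analysis needs some sequence divergence can be shown with a small sketch. It counts, for a set of aligned terminator stems, the base-paired positions at which two sequences differ on both sides of the pair while each still forms a canonical or G-U pair. This working definition, the function name and the toy sequences are assumptions for illustration and may not match the scoring implemented in our model.

```python
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def count_covariations(alignment: list[str], pairs: list[tuple[int, int]]) -> int:
    """Count paired columns showing at least one compensatory double change.

    alignment: RNA sequences of equal length (0-based columns).
    pairs: (i, j) column indices predicted to base-pair in the stem.
    """
    count = 0
    for i, j in pairs:
        observed = {(seq[i], seq[j]) for seq in alignment}
        valid = [bp for bp in observed if bp in CANONICAL]
        # A covariation requires two valid pairs that differ at both positions,
        # e.g. G-C in one genome and A-U in another.
        if any(a[0] != b[0] and a[1] != b[1] for a in valid for b in valid):
            count += 1
    return count


stem_pairs = [(0, 9), (1, 8), (2, 7)]
print(count_covariations(["GGCAAAAGCC", "GACAAAAGUC", "GGCAAAAGUC"], stem_pairs))
```

In this toy alignment only the pair (1, 8) changes from G-C to A-U, so one covariation is counted. If all genomes carried identical stems, or differed only by single changes such as G-C to G-U, the count would be zero, which is consistent with fewer candidates passing a covariation-based score in a highly conserved species.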

The Hfq protein is a chaperone for sRNAs found in numerous bacterial
species and is involved in the regulation of general cell metabolism
and virulence (1,2,7). It has recently been shown that Hfq contributes
to the virulence of E. coli strains causing urinary tract infection, a
subgroup of the ExPEC pathotype, suggesting that sRNAs have an
important regulatory role in the expression of ExPEC virulence (57).
Our analysis of multiple genome sequences of ExPEC strains revealed a
set of sRNA genes specific to this pathotype. Species-specific sRNAs
have been identified in other bacteria, such as S. aureus (5) or
S. typhimurium (6), but they are mostly located in IGRs and their
distribution could not often be easily associated with a function and a
degree of virulence. In particular, this is the case for
virulence-associated sRNA genes like RNAIII (63) and SprD from
S. aureus (64) and FasX from S. pyogenes (65). In contrast, the
identification of the FimR, HlyR and PrfR asRNAs in clusters of genes
required for the pathogenesis of cystitis and pyelonephritis (50)
suggested the possible association of these asRNAs with these
pathologies, as observed for the AmgR asRNA from S. enterica (66).
Moreover, the Hfq-dependent FimR regulation constitutes a rare case of
an asRNA acting as a positive regulator of gene expression, thus
revealing the importance of this new asRNA function. However, the
molecular mechanism by which FimR regulates type 1 fimbriae production
is still a matter of debate, even though the regulation of type 1
fimbriae production has been extensively studied (11). Recent models of
the post-transcriptional activation of collagenase mRNAs by VR-RNA in
Clostridia or of the streptokinase mRNA by FasX in Group A Streptococci
(67) provide insight into some of the possible mechanisms of regulation
by the FimR asRNA.

The control of expression of virulence genes during pathogenesis is
critical for the opportunistic pathogen S. agalactiae. As only three
complete genome sequences are currently available for the group B
streptococci, the distribution of sRNA genes in this species remains
largely unknown. We analyzed the genome sequence of the virulent strain
NEM316, identified 197 sRNA/asRNA genes and validated the expression of
26 of them. One putative sRNA previously reported to interact with the
CiaRH regulatory system from S. agalactiae NEM316 was also identified
in our analyses (59). The distribution of sRNA genes was uniform along
the S. agalactiae NEM316 genome, including the core genome and PAIs.
Moreover, the location of sRNA genes in the PAIs of S. agalactiae
suggests that this may be a common feature in pathogenic bacteria, as
reported for S. aureus (5) and S. typhimurium (6). These observations
indicate that the pathogenesis of Group B Streptococci may be
controlled by sRNAs, as demonstrated in Group A Streptococci
(65,68,69). The regulatory roles of the SQ18, SQ485 and SQ893 sRNAs on
the expression of adjacent mRNAs involved in virulence, as demonstrated
in this study, provide additional support for this hypothesis. However,
the role of sRNAs/asRNAs in the control of the virulence of Group B
Streptococci remains to be characterized and our list of candidates may
facilitate these studies.

This report demonstrated that an sRNA gene finder approach can
efficiently identify sRNAs located within IGRs, asRNAs and putative
sense RNAs transcribed within CDSs. The main advantage of in silico
approaches over in vivo techniques (tiling microarrays and RNA-seq) is
the capability to search for sRNAs in an unlimited number of strains
irrespective of their growth conditions. This catalog may then be used
to select the most valuable strains for in vivo studies and should
facilitate the post-screening identification of expressed sRNAs and
asRNAs in large collections of data. Accordingly, the results of our
analysis of the genomes of two major human pathogens, E. coli and
S. agalactiae, suggest that sRNAs as well as asRNAs are key elements in
the control of their virulence.